Pádraig Cunningham, Marc van Dongen,
Authors
Abstract
In this paper, we present a novel system which automatically converts text documents into XML by using machine-learning techniques. In the first phase, the system uses the Self-Organizing Map (SOM) algorithm to arrange marked-up documents on a two-dimensional map such that documents similar in content appear closer to each other. In the second phase, it uses the inductive learning algorithm C5.0 to automatically extract and apply markup information (in the form of rules) from the nearest SOM neighbours of an unmarked document. The system is designed to have an adaptive behaviour, so that once a document is marked up into XML, it learns from its errors to improve accuracy. The resulting marked-up document is again categorized on the SOM. The results of our experiments with a number of document sets from different domains indicate that our approach is practical.
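The first phase described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`train_som`, `best_matching_unit`, `nearest_marked_up`), the decay schedules, the grid size, and the toy term-frequency vectors are all assumptions made for illustration. The second phase (rule induction with C5.0) is not shown, because the abstract gives no detail on how the rules are extracted or applied.

```python
import math
import random


def best_matching_unit(grid, v):
    """Return the (row, col) cell whose weight vector is closest to v."""
    return min(grid, key=lambda cell: sum((a - b) ** 2 for a, b in zip(grid[cell], v)))


def train_som(vectors, rows, cols, epochs=50, seed=0):
    """Train a small SOM; returns a dict mapping (row, col) -> weight vector.

    The linear decay of learning rate and radius is an assumed, common choice.
    """
    rng = random.Random(seed)
    dim = len(vectors[0])
    grid = {(r, c): [rng.random() for _ in range(dim)]
            for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        for v in vectors:
            frac = step / total_steps
            lr = 0.5 * (1.0 - frac)                              # decaying learning rate
            radius = max(rows, cols) / 2.0 * (1.0 - frac) + 0.5  # shrinking neighbourhood
            br, bc = best_matching_unit(grid, v)
            for (r, c), w in grid.items():
                d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-d2 / (2.0 * radius * radius))      # Gaussian neighbourhood weight
                for i in range(dim):
                    w[i] += lr * h * (v[i] - w[i])
            step += 1
    return grid


def nearest_marked_up(grid, marked_up, query, k=2):
    """Names of the k marked-up documents whose map cells lie closest to the query's cell."""
    qr, qc = best_matching_unit(grid, query)

    def map_distance(item):
        r, c = best_matching_unit(grid, item[1])
        return (r - qr) ** 2 + (c - qc) ** 2

    ranked = sorted(marked_up.items(), key=map_distance)
    return [name for name, _ in ranked[:k]]


# Hypothetical normalized term-frequency vectors for already marked-up documents.
marked_up_docs = {
    "letter_a": [0.9, 0.1, 0.0],
    "letter_b": [0.8, 0.2, 0.1],
    "recipe_a": [0.1, 0.9, 0.7],
    "recipe_b": [0.0, 0.8, 0.9],
}
som = train_som(list(marked_up_docs.values()), rows=3, cols=3)
neighbours = nearest_marked_up(som, marked_up_docs, query=[0.85, 0.15, 0.05], k=2)
print(neighbours)
```

In the pipeline the abstract describes, the neighbours returned here would supply the training examples for the rule learner, and the newly marked-up document would then be placed on the map in turn.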
Related resources
A Theoretical Analysis of the Average-Time Complexity of Domain-Heuristics for Arc-Consistency Algorithms
Network of Excellence Multimedia Understanding through Semantics, Computation and LEarning
We present an automatic focus area estimation method that works with a single image, without a priori information about the image, the camera, or the scene. It produces relative focus maps by localized blind deconvolution and a new residual error-based classification. Evaluation and comparison are performed, and applicability is shown through image indexing.
FIONN: A Framework for Developing CBR Systems
Case-Based Reasoning (CBR) is a very popular methodology for developing knowledge-based systems [1]. Yet there are few toolkits available for building CBR systems. In this paper we present a framework called Fionn that is specifically designed for the development of CBR systems. Since Fionn was designed specifically for CBR it provides good support for some of the unique characteristics of CBR ...